visual place recognition
EMVP: Embracing Visual Foundation Model for Visual Place Recognition with Centroid-Free Probing
Visual Place Recognition (VPR) is essential for mobile robots, as it enables them to retrieve the database images closest to their current location. The progress of Visual Foundation Models (VFMs) has significantly advanced VPR by capturing representative descriptors in images. However, existing fine-tuning efforts for VFMs often overlook the crucial role of probing in effectively adapting these descriptors for improved image representation. In this paper, we propose the Centroid-Free Probing (CFP) stage, which makes novel use of second-order features to exploit VFM descriptors more effectively. Moreover, to adaptively control how much task-specific information is preserved based on the VPR context, we introduce the Dynamic Power Normalization (DPN) module in both the recalibration and CFP stages, forming a novel Parameter-Efficient Fine-Tuning (PEFT) pipeline (EMVP) tailored for the VPR task. Extensive experiments demonstrate the superiority of the proposed CFP over existing probing methods. Moreover, the EMVP pipeline further improves fine-tuning accuracy and efficiency.
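To make the terms "second-order features" and "power normalization" concrete, here is a minimal sketch in plain Python. It is not the EMVP implementation: the function names are hypothetical, the exponent p is fixed rather than predicted dynamically as DPN does, and the toy 2-D descriptors stand in for VFM patch descriptors.

```python
import math

def second_order_pool(descriptors):
    """Average outer product of local descriptors, giving a D x D matrix."""
    n = len(descriptors)
    d = len(descriptors[0])
    pooled = [[0.0] * d for _ in range(d)]
    for x in descriptors:
        for i in range(d):
            for j in range(d):
                pooled[i][j] += x[i] * x[j] / n
    return pooled

def power_normalize(matrix, p=0.5):
    """Signed element-wise power sign(v) * |v|**p.

    Assumption: p is a fixed constant here; the abstract's DPN module
    adapts the normalization to the input instead.
    """
    return [[math.copysign(abs(v) ** p, v) for v in row] for row in matrix]

def l2_normalize_flat(matrix):
    """Flatten the matrix and scale it to unit Euclidean length."""
    flat = [v for row in matrix for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]

# Toy local descriptors (hypothetical values).
descriptors = [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
pooled = second_order_pool(descriptors)
feature = l2_normalize_flat(power_normalize(pooled))
```

The exponent p < 1 compresses large entries of the pooled matrix, which is the usual motivation for power normalization of second-order statistics.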
Multiview Scene Graph
A proper scene representation is central to the pursuit of spatial intelligence where agents can robustly reconstruct and efficiently understand 3D scenes. A scene representation is either metric, such as landmark maps in 3D reconstruction, 3D bounding boxes in object detection, or voxel grids in occupancy prediction, or topological, such as pose graphs with loop closures in SLAM or visibility graphs in SfM. In this work, we propose to build Multiview Scene Graphs (MSG) from unposed images, representing a scene topologically with interconnected place and object nodes. The task of building MSG is challenging for existing representation learning methods since it needs to jointly address visual place recognition, object detection, and object association from images with limited fields of view and potentially large viewpoint changes. To evaluate any method tackling this task, we developed an MSG dataset and annotation based on a public 3D dataset. We also propose an evaluation metric based on the intersection-over-union score of MSG edges. Moreover, we develop a novel baseline method built on mainstream pretrained vision models, combining visual place recognition and object association into one Transformer decoder architecture. Experiments demonstrate that our method has superior performance compared to existing relevant baselines.
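An intersection-over-union score over graph edges can be sketched as follows. This is only an illustration of the general idea, assuming undirected edges given as node-name pairs; the paper's actual metric may treat place-place and place-object edges separately, and the node names below are hypothetical.

```python
def edge_iou(predicted_edges, ground_truth_edges):
    """IoU of two undirected edge sets: |P & G| / |P | G|."""
    # frozenset makes ("a", "b") and ("b", "a") the same undirected edge.
    pred = {frozenset(e) for e in predicted_edges}
    gt = {frozenset(e) for e in ground_truth_edges}
    union = pred | gt
    if not union:
        # Assumption: two empty graphs count as a perfect match.
        return 1.0
    return len(pred & gt) / len(union)

# Hypothetical graph with two image (place) nodes and two object nodes.
gt_edges = [("img1", "img2"), ("img1", "chair"), ("img2", "chair")]
pred_edges = [("img2", "img1"), ("img1", "chair"), ("img2", "table")]
score = edge_iou(pred_edges, gt_edges)  # 2 shared edges out of 4 distinct -> 0.5
```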
Adaptive Thresholding for Visual Place Recognition using Negative Gaussian Mixture Statistics
Visual place recognition (VPR) is an important component technology for camera-based mapping and navigation applications. This is a challenging problem because images of the same place may appear quite different for reasons including seasonal changes, weather, illumination, structural changes to the environment, and transient pedestrian or vehicle traffic. Papers focusing on generating image descriptors for VPR report their results using metrics such as recall@K and ROC curves. However, for a robot implementation, determining which matches are sufficiently good is often reduced to a manually set threshold, and it is difficult to manually select a threshold that will work for a variety of visual scenarios. This paper addresses the problem of automatically selecting a threshold for VPR by looking at the 'negative' Gaussian mixture statistics for a place, that is, image statistics indicating "not this place". We show that this approach can be used to select thresholds that work well for a variety of image databases and image descriptors.
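The idea of deriving a threshold from "negative" statistics can be illustrated with a simplified sketch. It is not the paper's method: it fits a single Gaussian instead of a Gaussian mixture, it treats every non-best database distance as a negative, and the margin k and all function names are hypothetical.

```python
import statistics

def negative_threshold(distances, best_index, k=3.0):
    """Threshold = mean - k * std of the distances to non-matching images.

    Simplification: one Gaussian is fitted to the negatives rather than
    a Gaussian mixture, and k is a hand-picked margin.
    """
    negatives = [d for i, d in enumerate(distances) if i != best_index]
    mu = statistics.fmean(negatives)
    sigma = statistics.pstdev(negatives)
    return mu - k * sigma

def accept_match(distances, k=3.0):
    """Return (best_index, accepted) for one query's descriptor distances."""
    best_index = min(range(len(distances)), key=distances.__getitem__)
    threshold = negative_threshold(distances, best_index, k)
    return best_index, distances[best_index] < threshold

# Hypothetical descriptor distances from one query to five database images.
clear = accept_match([0.9, 1.0, 1.1, 0.2, 1.0])       # best match stands out
ambiguous = accept_match([0.9, 1.0, 1.1, 0.85, 1.0])  # best match blends in
```

Because the threshold is computed per query from the distribution of negatives, it adapts to the descriptor and dataset instead of relying on one global constant.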
Going Places: Place Recognition in Artificial and Natural Systems
Milford, Michael, Fischer, Tobias
Place recognition--the process of an animal, person or robot recognizing a familiar location in the world--has attracted significant attention across multiple disciplines. In animals, this capability has evolved over millions of years through sophisticated neural mechanisms: hippocampal place cells fire at specific spatial locations (1), entorhinal grid cells provide spatial coordinates through hexagonal firing patterns (2), while diverse species demonstrate remarkable navigation--from desert ants using celestial cues and visual panoramas (3) to migratory birds returning to precise breeding sites across hemispheric distances (4). Humans extend these biological foundations with unique cognitive abilities, recognizing places not only through sensory perception but also through semantic meaning, emotional associations, and cultural context--enabling us to identify familiar locations from descriptions, memories, or even fictional narratives (5). In artificial systems, place recognition underpins core robotics functions such as localization, mapping, and long-term autonomy, developing into a mature field that, while sometimes inspired by biological principles, often diverges significantly in implementation to optimize for computational efficiency and metric accuracy. As research has grown in the area, so too has a rich landscape of surveys and reviews that reflect the field's evolution and diversification.
Scaling Image Geo-Localization to Continent Level
Lindenberger, Philipp, Sarlin, Paul-Edouard, Hosang, Jan, Balice, Matteo, Pollefeys, Marc, Lynen, Simon, Trulls, Eduard
Determining the precise geographic location of an image at a global scale remains an unsolved challenge. Standard image retrieval techniques are inefficient due to the sheer volume of images (>100M) and fail when coverage is insufficient. Scalable solutions, however, involve a trade-off: global classification typically yields coarse results (10+ kilometers), while cross-view retrieval between ground and aerial imagery suffers from a domain gap and has been primarily studied on smaller regions. This paper introduces a hybrid approach that achieves fine-grained geo-localization across a large geographic expanse the size of a continent. We leverage a proxy classification task during training to learn rich feature representations that implicitly encode precise location information. We combine these learned prototypes with embeddings of aerial imagery to increase robustness to the sparsity of ground-level data. This enables direct, fine-grained retrieval over areas spanning multiple countries. Our extensive evaluation demonstrates that our approach localizes more than 68% of queries within 200 m on a dataset covering a large part of Europe. The code is publicly available at https://scaling-geoloc.github.io.
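A result such as "more than 68% of queries within 200 m" is a recall-at-distance figure. The sketch below shows one common way to compute such a number, assuming predictions and ground truth are (latitude, longitude) pairs in degrees and using the haversine great-circle distance; the paper's evaluation protocol may differ, and the coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def recall_within(predictions, ground_truth, radius_m=200.0):
    """Fraction of queries whose predicted location is within radius_m."""
    hits = sum(
        haversine_m(*pred, *true) <= radius_m
        for pred, true in zip(predictions, ground_truth)
    )
    return hits / len(ground_truth)

# Two hypothetical queries with the same true location.
truth = [(47.3769, 8.5417), (47.3769, 8.5417)]
preds = [(47.3775, 8.5420), (47.3900, 8.5417)]  # about 70 m off, about 1.5 km off
recall_200m = recall_within(preds, truth)
```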
Through the Lens of Doubt: Robust and Efficient Uncertainty Estimation for Visual Place Recognition
Miller, Emily, Milford, Michael, Hafez, Muhammad Burhan, Ramchurn, SD, Ehsan, Shoaib
Visual Place Recognition (VPR) enables robots and autonomous vehicles to identify previously visited locations by matching current observations against a database of known places. However, VPR systems face significant challenges when deployed across varying visual environments, lighting conditions, seasonal changes, and viewpoint changes. Failure-critical VPR applications, such as loop closure detection in simultaneous localization and mapping (SLAM) pipelines, require robust estimation of place matching uncertainty. We propose three training-free uncertainty metrics that estimate prediction confidence by analyzing inherent statistical patterns in similarity scores from any existing VPR method. Similarity Distribution (SD) quantifies match distinctiveness by measuring score separation between candidates; Ratio Spread (RS) evaluates competitive ambiguity among top-scoring locations; and Statistical Uncertainty (SU) combines SD and RS into a unified metric that generalizes across datasets and VPR methods without requiring validation data to select the optimal metric. All three metrics operate without additional model training, architectural modifications, or computationally expensive geometric verification. Comprehensive evaluation across nine state-of-the-art VPR methods and six benchmark datasets confirms that our metrics excel at discriminating between correct and incorrect VPR matches and consistently outperform existing approaches. They do so with negligible computational overhead and improved precision-recall performance, making them deployable for real-time robotic applications across varied environmental conditions.
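The abstract does not give formulas for SD, RS, or SU, so the sketch below does not reproduce them. It only illustrates, under stated assumptions, what training-free confidence cues computed from one query's similarity scores can look like: a separation score for the top match and a ratio between the two best candidates. Function names and score values are hypothetical, and similarities are assumed to be positive.

```python
import statistics

def top_gap_zscore(similarities):
    """How many standard deviations the best score sits above the rest.

    Illustrative separation cue only; not the paper's SD definition.
    """
    ranked = sorted(similarities, reverse=True)
    best, rest = ranked[0], ranked[1:]
    spread = statistics.pstdev(rest)
    if spread == 0:
        return float("inf")
    return (best - statistics.fmean(rest)) / spread

def top2_ratio(similarities):
    """Second-best over best score; values near 1 indicate an ambiguous match.

    Illustrative ambiguity cue only; not the paper's RS definition.
    """
    ranked = sorted(similarities, reverse=True)
    return ranked[1] / ranked[0]

# Hypothetical similarity scores of one query against five database places.
confident = [0.92, 0.41, 0.38, 0.40, 0.35]
ambiguous = [0.62, 0.60, 0.59, 0.30, 0.28]

confident_cues = (top_gap_zscore(confident), top2_ratio(confident))
ambiguous_cues = (top_gap_zscore(ambiguous), top2_ratio(ambiguous))
```

Both cues need only the score vector that any retrieval-based VPR method already produces, which is what makes this family of metrics cheap to compute at run time.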